PROBLEM


- Shaders are compiled in draw calls
  - Emulating certain features in shaders


- Drivers keep shaders in some intermediate representation
- And insert additional code based on the states
- While compiling, everything stops


- Number of state combinations is exponential
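
To make the exponential growth concrete, the sketch below counts shader variants for a handful of emulated states. The state names are borrowed from this talk, but treating each one as an independent on/off switch is a simplifying assumption; some real states (the alpha-test function, the colorbuffer format) have more than two values, which only makes the count larger.

```python
from itertools import product

# Hypothetical boolean states; real drivers track more, and some are multi-valued.
STATES = [
    "alpha_test",
    "color_clamp",
    "poly_stipple",
    "smoothing",
    "alpha_to_one",
    "two_side",
]


def count_variants(states):
    """Count distinct variants if every state is an independent on/off switch."""
    return sum(1 for _ in product((False, True), repeat=len(states)))


for n in range(1, len(STATES) + 1):
    print(n, "states ->", count_variants(STATES[:n]), "variants")
```

Each added boolean state doubles the number of variants that might have to be compiled in a draw call.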


EMULATED STATES


- Fragment shader:
  - Conversion to colorbuffer formats (RGBA32, RGBA FP16, ...)
  - Alpha-test
  - Selecting between front and back colors
  - gl_FragColor
  - GL_ALPHA_TO_ONE
  - Polygon stippling
  - Line & polygon smoothing
  - Point smoothing
  - Fragment color clamping


EMULATED STATES, CONT.


- Vertex shader:
  - Loading inputs from vertex buffers manually
  - Vertex color clamping


IDEA


- Observation:
  - All states can be applied at the beginning or end of shaders
  - At link time, compile application shaders
  - At draw time, append any shader bytecode needed


- 3 shader sections:
  - Prolog section
  - Main section (application shader)
  - Epilog section


- Concatenate them
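
The split between link-time and draw-time work can be sketched as follows. This is a toy model, not radeonsi code: `compile_main`, `assemble_part` and `build_shader` are hypothetical names, and "bytecode" is represented as a list of strings.

```python
def compile_main(source):
    """Stand-in for the expensive link-time compile; returns fake 'bytecode' lines."""
    return [f"main: {line.strip()}" for line in source.splitlines() if line.strip()]


def assemble_part(instructions):
    """Stand-in for cheaply assembling a small prolog or epilog at draw time."""
    return list(instructions)


def build_shader(main_binary, prolog_ops, epilog_ops):
    """Concatenate prolog + main + epilog without recompiling the main section."""
    return assemble_part(prolog_ops) + main_binary + assemble_part(epilog_ops)


# Link time: compile the application shader once.
main_binary = compile_main("r0 = texture(tex, uv);")

# Draw time: only the small state-dependent parts change.
plain = build_shader(main_binary, [], ["out0 = r0;"])
with_alpha_test = build_shader(
    main_binary,
    [],
    ["if (!alphafunc(r0.w, alpharef)) discard;", "out0 = r0;"],
)
```

Both results share the same `main_binary`; a state change only swaps the short parts around it.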


5 | SOLUTION TO SHADER RECOMPILES IN RADEONSI | SEPTEMBER 17, 2015 


FRAGMENT SHADER EPILOG


Color outputs are expected in r0, r1, ...


out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If we need alpha-test:

if (!alphafunc(r0.w, alpharef)) discard;
out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If we need color clamping:

r0 = clamp(r0, 0, 1);
r1 = clamp(r1, 0, 1);
if (!alphafunc(r0.w, alpharef)) discard;
out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If we need polygon stippling:

r0 = clamp(r0, 0, 1);
r1 = clamp(r1, 0, 1);
if (!alphafunc(r0.w, alpharef)) discard;
if (texture2D(stipple, gl_FragCoord.xy / 32).x < 0.5) discard;
out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If we need smoothing:

r0 = clamp(r0, 0, 1);
r1 = clamp(r1, 0, 1);
if (!alphafunc(r0.w, alpharef)) discard;
if (texture2D(stipple, gl_FragCoord.xy / 32).x < 0.5) discard;
r0.w *= coverageMask; // popcount(gl_SampleMaskIn) / gl_NumSamples
r1.w *= coverageMask;
out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If color conversion is required:

r0 = clamp(r0, 0, 1);
r1 = clamp(r1, 0, 1);
if (!alphafunc(r0.w, alpharef)) discard;
if (texture2D(stipple, gl_FragCoord.xy / 32).x < 0.5) discard;
r0.w *= coverageMask; // popcount(gl_SampleMaskIn) / gl_NumSamples
r1.w *= coverageMask;
r0.xy = vec2(packHalf2x16(r0.xy), packHalf2x16(r0.zw));
r1.xy = vec2(packHalf2x16(r1.xy), packHalf2x16(r1.zw));
out0 = r0;
out1 = r1;


FRAGMENT SHADER EPILOG


If GL_ALPHA_TO_ONE is enabled:

r0 = clamp(r0, 0, 1);
r1 = clamp(r1, 0, 1);
if (!alphafunc(r0.w, alpharef)) discard;
if (texture2D(stipple, gl_FragCoord.xy / 32).x < 0.5) discard;
r0.w *= coverageMask; // popcount(gl_SampleMaskIn) / gl_NumSamples
r1.w *= coverageMask;
r0.w = 1;
r0.xy = vec2(packHalf2x16(r0.xy), packHalf2x16(r0.zw));
r1.xy = vec2(packHalf2x16(r1.xy), packHalf2x16(r1.zw));
out0 = r0;
out1 = r1;
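
The progression over the epilog slides suggests that an epilog can be generated from a small key describing which states are active. The sketch below emits the pseudocode lines in the same order as the final slide. `EpilogKey` and its field names are hypothetical and only illustrate the idea; they are not the driver's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpilogKey:
    # Hypothetical state key; field names are illustrative, not radeonsi's.
    clamp_color: bool = False
    alpha_test: bool = False
    poly_stipple: bool = False
    smoothing: bool = False
    alpha_to_one: bool = False
    fp16_export: bool = False
    num_outputs: int = 2


def build_epilog(key):
    """Emit epilog pseudocode lines in the same order as the slides."""
    regs = [f"r{i}" for i in range(key.num_outputs)]
    lines = []
    if key.clamp_color:
        lines += [f"{r} = clamp({r}, 0, 1);" for r in regs]
    if key.alpha_test:
        lines.append("if (!alphafunc(r0.w, alpharef)) discard;")
    if key.poly_stipple:
        lines.append("if (texture2D(stipple, gl_FragCoord.xy / 32).x < 0.5) discard;")
    if key.smoothing:
        lines += [f"{r}.w *= coverageMask;" for r in regs]
    if key.alpha_to_one:
        lines.append("r0.w = 1;")
    if key.fp16_export:
        lines += [
            f"{r}.xy = vec2(packHalf2x16({r}.xy), packHalf2x16({r}.zw));" for r in regs
        ]
    # The exports are always present, whatever the state.
    lines += [f"out{i} = {r};" for i, r in enumerate(regs)]
    return lines
```

Because the key is frozen and hashable, it could also serve as a dictionary key for caching assembled epilogs, so a repeated state combination would not need to be assembled twice.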


FRAGMENT SHADER PROLOG


- Only contains two-side color selection


- Decreases performance if done always


- 3 scenarios:
  - Two-side colors are enabled:
    - Select colors based on gl_FrontFacing
    - Store them into registers r0, r1
  - Two-side colors are disabled:
    - Just copy front colors into r0, r1
  - No color inputs => prolog is empty


- Application shader should read colors from r0, r1
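
The three prolog scenarios can be modelled on the CPU as a single function. This is only a sketch: `fs_prolog` is a hypothetical name, colors are plain Python values, and the returned list stands for the contents of r0, r1, ... that the main section reads.

```python
def fs_prolog(front_colors, back_colors, front_facing, two_side_enabled):
    """Return the values the prolog would leave in r0, r1, ... for the main section."""
    if not front_colors:
        # No color inputs: the prolog is empty.
        return []
    if not two_side_enabled:
        # Two-side colors disabled: just copy the front colors.
        return list(front_colors)
    # Two-side colors enabled: pick per fragment based on facing.
    return [f if front_facing else b for f, b in zip(front_colors, back_colors)]
```

The main section never needs to know which scenario applied; it reads its colors from the same registers in all three cases.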


COMPILING PROLOGS/EPILOGS


- Still have to be compiled in draw calls
  - Can be slow


- Use an assembler instead of the compiler
  - Our LLVM backend has an assembler too


VERTEX SHADER INPUTS


- R600 had fetch shader
- Removed since GCN


- Current implementation:
  - One buffer per input
  - Instance divisor == 0: Fetch BaseVertex + VertexID
  - Instance divisor != 0: Fetch StartInstance + (InstanceID / instance divisor)
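
The two fetch rules above amount to the following index computation. The function name and argument order are illustrative, and the division is assumed to be an integer division, as instance divisors are normally applied.

```python
def fetch_index(instance_divisor, base_vertex, vertex_id, start_instance, instance_id):
    """Element index to fetch from the vertex buffer bound to one input."""
    if instance_divisor == 0:
        # Per-vertex attribute.
        return base_vertex + vertex_id
    # Per-instance attribute: advance once every `instance_divisor` instances.
    return start_instance + instance_id // instance_divisor
```

For example, with a divisor of 2, instances 6 and 7 fetch the same element.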


VERTEX SHADER PROLOG


- Emulate fetch shader with prolog section
  - Drawback: can't move loads to hide latencies, register usage


- Instead, only calculate load addresses:
  - Prolog writes the addresses to r0, r1, ...
  - Main shader section executes the loads
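
A toy model of this split is shown below: the prolog only computes one index per input, and the main section performs the loads itself. The names `vs_prolog` and `vs_main` are hypothetical, and Python lists stand in for vertex buffers and registers.

```python
def vs_prolog(divisors, base_vertex, vertex_id, start_instance, instance_id):
    """Prolog: compute one fetch index per input and leave it in r0, r1, ..."""
    regs = []
    for divisor in divisors:
        if divisor == 0:
            regs.append(base_vertex + vertex_id)
        else:
            regs.append(start_instance + instance_id // divisor)
    return regs


def vs_main(buffers, regs):
    """Main section: issue the loads itself, one buffer per input."""
    return [buf[idx] for buf, idx in zip(buffers, regs)]


positions = ["p0", "p1", "p2"]      # per-vertex input (divisor 0)
instance_tints = ["red", "blue"]    # per-instance input (divisor 1)

regs = vs_prolog([0, 1], base_vertex=0, vertex_id=2, start_instance=0, instance_id=1)
inputs = vs_main([positions, instance_tints], regs)
```

Since the loads stay in the main section, the compiler remains free to schedule them where they best hide latency; only the state-dependent index arithmetic lives in the prolog.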


VERTEX SHADER EPILOG?


- Radeon has 3 ways to write VS outputs:
  - For rasterizer
  - For geometry shader
  - For tessellation control shader


- Don't use an epilog
- OpenGL sometimes knows which shader follows
- If not, compile all 3 variants with 3 threads in parallel


- Piglit only: Compile on demand in draw calls


- Vertex color clamping: use conditional assignment
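
The "compile all 3 variants with 3 threads" fallback on this slide could be structured roughly as below. This is a sketch of the control flow only: `compile_vs_variant` is a hypothetical stand-in for the real compiler, and Python threads are used purely to show the shape of the parallel dispatch.

```python
from concurrent.futures import ThreadPoolExecutor

# The three ways of writing vertex shader outputs listed on the slide.
VS_VARIANTS = ("rasterizer", "geometry_shader", "tess_control_shader")


def compile_vs_variant(source, target):
    """Stand-in for a real compile; tags the source with its output path."""
    return f"{target}:{source}"


def compile_all_variants(source):
    """Compile one variant per output path, each on its own worker thread."""
    with ThreadPoolExecutor(max_workers=len(VS_VARIANTS)) as pool:
        futures = {t: pool.submit(compile_vs_variant, source, t) for t in VS_VARIANTS}
        return {t: f.result() for t, f in futures.items()}


variants = compile_all_variants("vs_source")
```

At draw time the driver would then only need to pick the variant that matches the shader stage that follows.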


MESA STATE TRACKER


- Middle-end, translates shaders from GLSL IR into TGSI
- Does that in draw calls


- State dependencies for draw calls:
  - Center vs sample interpolation
    - Instead, select coordinates with conditional assignment
  - Vertex and fragment color clamping
  - GL rendering context


- Any dependencies should be dealt with in drivers


- Other drivers will benefit too
  - GLSL->TGSI always done at link time
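
"Conditional assignment" here means that the shader computes both candidates and picks one based on a value read at run time, so the state no longer forces a different shader. A CPU-side sketch, with hypothetical function names and plain tuples for coordinates and colors:

```python
def select(cond, a, b):
    """Conditional assignment: both inputs already exist, cond picks one."""
    return a if cond else b


def interp_coords(center_ij, sample_ij, use_sample):
    """Choose between center and per-sample interpolation coordinates."""
    return select(use_sample, sample_ij, center_ij)


def clamp_color(color, clamp_enabled):
    """Clamp to [0, 1] only when the state asks for it, without a new variant."""
    clamped = tuple(min(max(c, 0.0), 1.0) for c in color)
    return select(clamp_enabled, clamped, tuple(color))
```

The cost is a few extra instructions in every shader, in exchange for removing the state from the set of things that can trigger translation in a draw call.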




IF GAMES COMPILE TOO LATE


- Compiling at link time doesn't help
- Use shader cache


- 1 shader variant => shader cache in core Mesa


- If games compile early => don't need it
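
With a single variant per shader, a cache entry can be keyed on the shader source alone. The sketch below is an in-memory illustration under that assumption; `ShaderCache` is a hypothetical class, and a cache meant to help across application runs would have to persist its entries, which is not shown.

```python
import hashlib


class ShaderCache:
    """In-memory cache keyed by a hash of the shader source."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._entries = {}
        self.misses = 0

    def get(self, source):
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        if key not in self._entries:
            # First request for this source: pay for the compile once.
            self.misses += 1
            self._entries[key] = self._compile(source)
        return self._entries[key]


cache = ShaderCache(lambda src: src.upper())  # toy "compiler"
first = cache.get("void main() {}")
second = cache.get("void main() {}")
```

If every state combination produced its own variant, the key would also have to include the state, which is why having a single variant makes a cache at this level practical.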


SKIP MESA OPTIMIZATIONS?


- Our LLVM backend can do most optimizations
  - No need to do them in Mesa


- Mesa/GLSL passes we do need:
  - Demoting inputs/outputs to local variables (dead code elimination?)
  - Function inlining
  - Breaking built-in input/output arrays into variables


Questions? 




THANK YOU. 


DISCLAIMER & ATTRIBUTION


The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.


The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product 
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the 
right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. 


AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. 


AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL 
DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 